Abstract We present a method to compare the scientific contributions of
research groups. Given multiple research groups, we construct their
journal/proceeding graphs and then compute the similarity/gap between
them using network analysis. This analysis can be used to measure the
similarity/gap in topics and quality between research groups' scientific
contributions. We demonstrate the practicality of our method by comparing
the scientific contributions of Korean researchers with those of global
researchers in information security from 2006 to 2008. The empirical
analysis shows that current security research in South Korea has been
isolated from the global research trend.

1 Introduction

For many areas, the evaluation of scientific contributions is a significant
issue in the allocation of research funding and the assessment of the quality
of research conducted by universities, institutes or countries.

Peer review, where the evaluation is based on judgements formulated
by independent experts, is commonly accepted as an ideal solution for this
purpose since scientific contributions can be effectively evaluated by experts
who are knowledgeable in the subject area being reviewed. Rankings and
supporting qualitative evaluations by the experts can provide comparative
information between research groups. However, despite its effectiveness,
peer review poses a challenging task in practice: how to assign unbiased
and transparent experts. It is not trivial to recruit a peer review
committee composed of specialists in a particular subject on time and
within a limited budget [36]. Moreover, peer review is relatively slow and
inefficient in reaching a final decision.

Alternatively, it has been tempting to use bibliometrics as simple and
practical tools to assess scientific contribution. Bibliometric indicators such
as the number of publications, journal impact factors, number of citations,
and citation index are readily available and provide some meaningful
information on the level of research productivity and scientific impact.
However, these indicators can change greatly depending on the bibliometric
database being used, so it is important to use a database that suits the
purpose.

The ISI bibliographic database, which includes the Arts and Humanities
Citation Index (A&HCI), Science Citation Index (SCI) and Social
Sciences Citation Index (SSCI), has been used for decades as the de facto
standard database for conducting publication and citation analyses [20].
However, it is not desirable to view it as a universal database regardless of
the purpose. First of all, the coverage of the database is not uniform across
subjects. Different research fields are covered unequally, and only a
few conference proceedings and books, which are also important scientific
literature, are included in the database. Unlike in other fields such as the
natural sciences and life sciences, prestigious conferences hosted by
professional computer science societies such as ACM/IEEE are preferred to
journals as
a place to present original and important results ljj [H|- Moreover, some 



national journals, which are important in the social sciences and humanities,
may not be considered since the databases have an English language
bias [37]. Lastly, although the database attempts to include the most
important scientific literature for a specific subject, it is difficult to
isolate the scientific contributions relevant to that subject since unrelated
literature is also included in the database. For example, suppose
that we want to evaluate a research group's scientific contributions to
Russian history. The ISI bibliographic database is not appropriate for this
purpose since some relevant (Russian) literature may not be included in the
database, whereas irrelevant literature may be. Our study is motivated by
this limitation of depending on a bibliographic database.

Our goal is to design a research evaluation method that can compare
the scientific contributions of research groups directly, without relying on
a specific bibliographic database. We propose a way to compare the scientific
contributions of research groups, inspired by recent advances in complex
network analysis. This analysis can be a good alternative to peer review or
conventional bibliometric indicators since we can compare the scientific
contributions of a given research group with those of well-known experts.
In this paper, we make the following two contributions.

— We propose an analysis method to measure the similarity/gap between
research groups by comparing their publication patterns in Section 2.
To compare publication patterns, we construct relationship
graphs on their publications and then analyse the relevance between
the constructed graphs. We suggest metrics to measure the
similarity/gap between the research groups' publication outputs. This method
is useful to see how close a research group is to the research mainstream
in a specific field.

— As a practical application, we compare the publication outputs of South
Korea with those of global researchers in information security during
the period 2006 - 2008 in Section 3. Our main results are shown in
Tables 3 and 4 in Section 3.2. The experimental results show that, as
suspected, Korean security researchers have been somewhat isolated from
the mainstream.

Although the proposed measurement does not capture the research quality
of the scientific contributions from a research group, this analysis can
measure how similar the publication outputs of one research group are to
those of another research group. Consequently, it can serve as a useful
supplement for research evaluation or trend analysis.

2 The proposed method

Our goal is to analyse the similarity/gap between the scientific contributions
of multiple research groups. First, we construct each research group's
journal/proceeding graph from its publication outputs, and then analyse
the similarity/gap between the groups by comparing the constructed graphs.

If we compare sets of researchers with different cardinalities, appropriate
normalization is required. For simplicity, we assume that all the sets
being compared have the same cardinality.

2.1 Construction of journal/proceeding graphs

Given a set of researchers $R$, we construct the journal/proceeding graph
by taking the following steps:

1. For each researcher $a \in R$, collect $a$'s publication outputs within a
time window (e.g. within 2008).

2. Generate the bipartite graph $G_R$ from the collected publication data.
Its nodes are divided into a set of authors $A$ and a set of
journals/proceedings $J$, and an edge $(a, j)$ means that the author $a$
published a paper in the journal (or proceeding) $j$, for $a \in A$ and
$j \in J$.

3. Construct the J-projected graph by J-projection, a well-known technique
(so-called one-mode projection) that shows the relations among a particular
set of nodes only [12], [39]. The J-projection is a network containing only
the nodes in $J$, where two nodes are connected when they have at least one
common author. The weight of each edge is computed as 1/(the number of
shared authors).

The constructed journal/proceeding graph gives information not only about
the set of topologically popular journals/proceedings for a research group
but also about their relative importance, obtained by computing centrality
metric values such as degree, closeness and betweenness. We describe the
definition and meaning of these metrics in Appendix A. We call
"m-central nodes" the set of nodes whose values of a metric m are greater
than the average value over the graph. Consequently, we can identify the
relatively important journals (or proceedings) in a journal/proceeding graph
by observing the m-central nodes in the graph.

For example, suppose that we have two researchers: $R = \{a_1, a_2\}$.
When the researchers $a_1$ and $a_2$ published their papers in the journals
$j_1, j_2, j_3$ and $j_2, j_3, j_4$, respectively, we have the bipartite
graph shown in Figure 1 (a). From the bipartite graph, we can construct the
J-projected graph shown in Figure 1 (b). In our projection method, the weight
of each edge is inversely proportional to the number of authors shared
between two journals (or proceedings), representing how close they are; the
weight of the edge $(j_2, j_3)$ is therefore 0.5 since $a_1$ and $a_2$ both
published papers in journals $j_2$ and $j_3$. From this graph, we can
identify $\{j_2, j_3\}$ as degree-central nodes, whose degree values are
greater than 2.

(a) A bipartite graph (b) A J-projected graph

Fig. 1 An example of journal/proceeding graph construction
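The construction steps and the two-researcher example can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `j_projection` and `degree_central_nodes` and the dictionary layout are assumptions made here, and degree is taken to be the unweighted number of adjacent edges.

```python
from collections import defaultdict
from itertools import combinations


def j_projection(publications):
    """Project an author -> journals mapping onto journals only.

    Returns {frozenset({j, j2}): weight}, where two journals are linked
    when they share at least one author and the weight is
    1 / (number of shared authors).
    """
    shared_authors = defaultdict(set)
    for author, journals in publications.items():
        for j, j2 in combinations(sorted(set(journals)), 2):
            shared_authors[frozenset((j, j2))].add(author)
    return {edge: 1.0 / len(authors) for edge, authors in shared_authors.items()}


def degree_central_nodes(edges):
    """Nodes whose (unweighted) degree exceeds the average degree.

    Journals without any edge do not appear in `edges` and are ignored.
    """
    degree = defaultdict(int)
    for edge in edges:
        for node in edge:
            degree[node] += 1
    average = sum(degree.values()) / len(degree)
    return {node for node, d in degree.items() if d > average}


# The example from the text: a1 publishes in j1, j2, j3 and a2 in j2, j3, j4.
publications = {"a1": ["j1", "j2", "j3"], "a2": ["j2", "j3", "j4"]}
edges = j_projection(publications)
central = degree_central_nodes(edges)
```

Here `edges` contains five edges, the edge between `j2` and `j3` has weight 0.5, and `central` is the set of `j2` and `j3`, in line with the example.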

However, it is still rather difficult to explain the similarity /gap between 
the graphs although the nodes' centrality values show their relative im- 
portance for a research group. Therefore we need to define the metrics to 
measure the similarity /gap between the graphs quantitatively. 

2.2 Comparison of the journal/proceeding graphs

We analyse the similarity/gap between the journal/proceeding graphs
constructed in Section 2.1. For this purpose, we suggest functions to
measure the similarity/gap between networks explicitly. We classify these
functions into two types: "the fraction of overlapping nodes/interactions" for
similarity and "the distance between the graphs" for gap.

Given $k$ J-projected graphs $G_1^J = (V_1, E_1), \ldots, G_k^J = (V_k, E_k)$,
the symbols $V_U$ and $V_\cap$ represent the superset of nodes in the graphs
($\bigcup_{i=1}^{k} V_i$) and the set of common nodes between the graphs
($\bigcap_{i=1}^{k} V_i$), respectively.



2.2.1 Fraction of overlapping nodes/interactions To measure the similarity
between the journal/proceeding graphs, we can simply count the number of
common nodes or edges. The first metric is the ratio of common
nodes between the J-projected graphs. We define this as follows:

Definition 1 Ratio of common nodes - Given $k$ J-projected graphs
$G_1^J = (V_1, E_1), \ldots, G_k^J = (V_k, E_k)$, we define "the ratio of
common nodes" as

$$R_{node}(G_1^J, \ldots, G_k^J) = \frac{|V_\cap|}{|V_U|}$$
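As a quick check of Definition 1, the ratio can be computed directly from the node sets. The sketch below is illustrative only: `ratio_of_common_nodes` is a name chosen here, and the `korean_only_*` and `global_only_*` node names are hypothetical placeholders that merely reproduce the 2006 graph sizes reported in Table 2 (39 and 58 nodes) with the three common nodes listed in Table 3.

```python
def ratio_of_common_nodes(node_sets):
    """R_node = |intersection of all node sets| / |union of all node sets|."""
    node_sets = [set(nodes) for nodes in node_sets]
    union = set().union(*node_sets)
    if not union:
        raise ValueError("at least one graph must contain a node")
    common = set.intersection(*node_sets)
    return len(common) / len(union)


# 2006: three common nodes, 39 Korean nodes and 58 global nodes in total.
common_2006 = {"ACNS", "FC", "SDM"}
korean_2006 = common_2006 | {f"korean_only_{i}" for i in range(36)}
global_2006 = common_2006 | {f"global_only_{i}" for i in range(55)}
r_node_2006 = ratio_of_common_nodes([korean_2006, global_2006])
```

With these sizes the union has 39 + 58 - 3 = 94 nodes, so the ratio is 3/94, which rounds to the 0.032 reported for 2006 in Table 3.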



The second metric is the ratio of common interactions between the
J-projected graphs. Unlike Definition 1, however, we cannot simply count
common edges, since such a count does not account for node pairs that are
unconnected in all graphs and is therefore not a comprehensive view of the
interactions between the graphs.

To represent non-existing edges, we define the function
$e_i : V_U \times V_U \to \{0, 1\}$ by

$$e_i(u, v) = \begin{cases} 1, & \text{if } (u, v) \in E_i, \\ 0, & \text{otherwise.} \end{cases}$$

Likewise, we define the function $\bar{e}_i$ as the negation of $e_i$.
Finally, we define the ratio of common interactions between the graphs with
the functions $e_i$ and $\bar{e}_i$ as follows:

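Definition 2 Ratio of common interactions - Given $k$ J-projected graphs
$G_1^J = (V_1, E_1), \ldots, G_k^J = (V_k, E_k)$, we denote by $n$ the
cardinality of $V_U$, and then define "the ratio of common interactions" as

$$R_{interaction}(G_1^J, \ldots, G_k^J) \propto \sum_{u, v \in V_U} \left( \prod_{i=1}^{k} e_i(u, v) + \prod_{i=1}^{k} \bar{e}_i(u, v) \right),$$

normalized by a factor that depends on $n$.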




These metrics explain how similar the network structures (e.g. interaction
patterns) are.

2.2.2 Distance between the graphs In general, computing the common parts
between graphs may not be suitable for comparing journal/proceeding graphs
since they have only a few common nodes and edges.
As more sophisticated measures, we can consider the distance between the
nodes in the J-projected graphs. The first metric is the distance
between the common nodes and the other nodes of the J-projected graphs.
We define this as follows:

Definition 3 Closeness of common nodes - Given $k$ J-projected graphs
$G_1^J = (V_1, E_1), \ldots, G_k^J = (V_k, E_k)$, we define "the closeness of
common nodes" as follows:



This metric measures how close all other journals/proceedings in the
network are to their common journals/proceedings. Using this value, we can
explain how close a node in each graph is, on average, to the common nodes
between the graphs. This value will be exactly if and
only if $V_\cap$ is the same as $V_U$.

For some applications, it is also important to observe the diversity of
journals/proceedings between researchers. This property is closely
related to the network diameter¹ of a J-projected graph. Therefore we
measure how much the network diameter of the union graph
$G_U^J = (V_U, E_U)$, where $E_U = \bigcup_{i=1}^{k} E_i$, increases after
combining all the J-projected graphs. We compute the average increase in
diameter of the union graph $G_U^J$ as follows:

Definition 4 Average increasing diameter - Given $k$ J-projected graphs
$G_1^J = (V_1, E_1), \ldots, G_k^J = (V_k, E_k)$, we define "the average
increasing diameter" as follows:

$$AD(G_1^J, \ldots, G_k^J) = \frac{1}{k} \sum_{i=1}^{k} \left( \mathrm{diameter}(G_U^J) - \mathrm{diameter}(G_i^J) \right)$$
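The average increasing diameter can be illustrated with a short Python sketch. The code below is an assumption-laden illustration rather than the authors' implementation: it treats the diameter as the longest shortest path counted in hops (edge weights are ignored), requires every graph and their union to be connected, and uses hypothetical helper names (`diameter`, `union_graph`, `average_increasing_diameter`) and toy graphs.

```python
from collections import deque


def diameter(adj):
    """Longest shortest-path length, in hops, of a connected undirected graph."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adj[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        if len(dist) != len(adj):
            raise ValueError("graph must be connected")
        best = max(best, max(dist.values()))
    return best


def union_graph(graphs):
    """Union of node sets and edge sets of symmetric adjacency dictionaries."""
    merged = {}
    for adj in graphs:
        for node, neighbours in adj.items():
            merged.setdefault(node, set()).update(neighbours)
    return merged


def average_increasing_diameter(graphs):
    """AD = (1/k) * sum_i (diameter(union) - diameter(G_i))."""
    d_union = diameter(union_graph(graphs))
    return sum(d_union - diameter(g) for g in graphs) / len(graphs)


# Two toy path graphs that share only node "c".
g1 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}                    # a-b-c
g2 = {"c": {"d"}, "d": {"c", "e"}, "e": {"d", "f"}, "f": {"e"}}   # c-d-e-f
ad = average_increasing_diameter([g1, g2])
```

The union of the two toy graphs is the path a-b-c-d-e-f with diameter 5, so AD = ((5 - 2) + (5 - 3)) / 2 = 2.5. Applying the same formula to the 2008 values reported in the example of Section 3 (diameters 3 and 4, union diameter 4) gives ((4 - 3) + (4 - 4)) / 2 = 0.5, which matches Table 4.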



3 An example

We demonstrate the practicality of our method by comparing the scientific
contributions of Korean researchers with those of global researchers in
information security from 2006 to 2008.

Our goal is to show how close the scientific contributions of Korean
researchers are to the research mainstream by comparing their publication
outputs with those of well-known global researchers. As an example, we






1 Network diameter is the maximum distance between nodes in the network [31].

                 |A|    |J|    # journals/proceedings
2006 (Korean)    14     40     77
2007 (Korean)    16     43     61
2008 (Korean)    14     37     51
2006 (Global)    14     58     73
2007 (Global)    16     46     59
2008 (Global)    17     55     68

Table 1 Summary of publication data: |A| and |J| represent the number
(cardinality) of the authors and their publications, respectively. We have
|A| < 20 since we plotted only authors who have at least one publication.



analyse security research in South Korea from 2006 to 2008. We use a sample
set since it is practically infeasible to collect all publications related to
security. To obtain a reasonable sample, we perform the following two steps:

1. Select top conferences related to the security field and held in South
Korea.

2. Randomly select n Korean researchers from the program committee
members of the selected conferences.

In selecting conferences, some prior knowledge is required. We select two
conferences, "International Workshop on Information Security Applications"
(WISA) [30] and "International Conference on Information Security and
Cryptology" (ICISC) [29], on the basis of their large scale and long
history compared to other conferences. We set the sample size to 20
(n = 20), assuming that 20 active researchers are enough to show the
characteristics or trend. Let T be the set of randomly selected researchers
from the program committee members of these conferences.

In a similar manner, we obtain a reasonable sample set of global
researchers by using the top international conferences for security: "IEEE
Symposium on Security and Privacy", "ACM Conference on Computer and
Communications Security" and "USENIX Security Symposium". These
conferences are selected according to the conference rankings of well-known
web sites
H, [H, Q • Let P be a set of randomly selected researchers from the pro- 



gram committee members of these conferences. 

We collect the publication results of T and P from 2006 to 2008.
For simplicity, we only consider the bibliographic information indexed
by the Digital Bibliography & Library Project (DBLP) [17], under the
assumption that this database provides the most bibliographic information on
major computer science journals and proceedings.

With the collected publication data, we construct the bipartite graphs
for each year and each research group from 2006 to 2008. From these
bipartite graphs, we analyse the basic network properties, which we summarize
in Table 1. Since some researchers in T and P do not have any publications in
the DBLP database during 2006 - 2008, we can only draw between 14 and
17 authors who have at least one publication in the related year among the 20
sampled researchers.

3.1 Journal/proceeding graphs

By the J-projection described in Section 2.1, we construct J-projected
graphs² from the bipartite graphs. The resulting graphs are shown in
Figure 2. In Figure 2, the size of each node is linearly proportional to the
node's degree, and the acronyms of journals/proceedings (we provide
supplementary material giving the full names of the journals/proceedings³)
are used as the nodes' identifiers.

We summarize several basic network properties of each J-projected graph
in Table 2. We observe that the network size of the Korean researchers'
J-projected graphs appears to decrease roughly from year to year.





                 # nodes    # edges    average distance    diameter
2006 (Korean)    39         180        2.000               5
2007 (Korean)    40         128        2.103               4
2008 (Korean)    30         99         1.993               3
2006 (Global)    58         181        3.019               6
2007 (Global)    38         123        2.077               4
2008 (Global)    44         155        2.297               4

Table 2 J-projected graphs' properties



As we discussed in Section 2.1, we can interpret the relative prominence
of journals/proceedings embedded in the graphs by computing centrality
metric values such as degree, closeness and betweenness. We can identify the
m-central nodes for each research group from Figures 3, 4 and 5 in
Appendix B. While each research group's central journals/proceedings
have not changed very much over time, there is almost no common central
node between the Korean researchers' and the global researchers' graphs. In
particular, "IEICE(J)" and "CCS" are the key journal (or proceeding) for
the Korean research group and the global research group, respectively.

3.2 Comparison of two research groups

First of all, we measure the metrics of overlapping nodes/interactions. The
results are shown in Table 3. While all "the ratios of common nodes"
are under 10%, all "the ratios of common interactions" are higher than
45%. We note that "the ratio of common interactions" is not a
meaningful metric in this application since the graphs are too sparse and
there is no common edge between the two research groups' journal/proceeding
graphs.

2 Without loss of generality, in the case of a disconnected graph, we only
consider the largest connected component in the graph since it is commonly
believed that the largest component is most meaningful in practice.

3 In the acronyms, (J) means a journal article.

(e) Korean researchers (2008) (f) Global researchers (2008)

Fig. 2 J-projected graphs from 2006 to 2008

        Common nodes                               R_node    R_interaction
2006    ACNS, FC, SDM                              0.032     0.472
2007    ACIS, ACNS, ICISS, JNCA(J), Pairing        0.068     0.472
2008    CCS, ESORICS, JUCS(J), TISSEC(J), WCNC     0.072     0.466

Table 3 Overlapping nodes/edges between the J-projected graphs






Hyoungshick Kim, Ji Won Yoon 



To measure the gap between these graphs, we compute the metrics
defined in Section 2.2.2. The results are shown in Table 4.





        C_common    AD
2006    1.709       1.5
2007    1.288       2.0
2008    1.105       0.5

Table 4 Distance between the J-projected graphs



From Table 4, we can see that the distance between the common nodes
and the other nodes of the J-projected graphs decreases continuously from
year to year. We can also see that AD is close to 0 in 2008 since the network
diameter of the union graph $G_U^J$ decreased to 4 in 2008. This shows that
the journals/proceedings to which the Korean researchers submitted do not
deviate much from the research mainstream.



3.3 Discussion

We compared the journals/proceedings that Korean researchers have focused
on with those that global researchers have focused on by projecting
the bipartite graphs into J-projected graphs. From Tables 3 and 4, we found
that the Korean and global research groups share only a small fraction of
journals/proceedings and that their journal/proceeding graphs have somewhat
different structures. That is, Korean researchers and global researchers
publish their papers in different journals or conferences even though they
work on the same subject. Under the assumption that a global research
group is close to the ideal research group, we claim that the Korean
research group will have to exert itself more than it currently does to
publish more papers in journals/proceedings with high centrality (e.g. CCS,
WPES, USENIX and ACSAC) in the global researchers' graphs, as shown
in Figures 3, 4 and 5 in Appendix B. However, the metrics in Tables 3 and
4 also show that the Korean security researchers' publication pattern in 2008
is somewhat closer to the mainstream than that in 2006.

Our work is primarily intended to demonstrate how to compare publication
patterns between research groups. We have not considered research
quality since the results of our metrics may not give enough evidence to
compare quality between two groups (although we can guess). The proposed
analysis of publication patterns can, however, be a useful supplement to,
rather than a replacement for, traditional research evaluation methods.

In addition, conference (or journal) selection is strongly related to
geographical and political factors in the real world. In this paper, we do
not consider these factors.

4 Related work



The use of statistical bibliometric indicators in research evaluation emerged 
in the 1960s and 1970s [3], and is in wide use today due to the development 
of the relevant databases. These indicators provide useful output measures 
of activity and performance in scientific research and have become standard 
tools for research evaluation [l|. However, some methodological problems 
of research evaluation at the micro level (e.g. the scientific contribution of 
a small research group) still remain unresolved [H, [22j]. Meyer et al. [2l[ 
discussed the problems of bibliometric indicators for computer science in
detail.

An alternative approach is to analyse researchers' social networks such as 
co-citation networks [l, H, 0, Q and co-authorship networks 24|, EE EH 0] • 
Citation networks can be also used to evaluate the importance of jour- 
nals/proceedings by computing centrality values of the nodes in a citation 
graph. Co-authorship networks are an important class of social networks 
and have been used extensively. Many co-authorship networks have been 
studied to investigate the patterns, motivation, and the structure of scien- 
tific collaboration [1, 0, El, \MM EE EE EH - Morris Q proposed a model to 
monitor the birth and development of a scientific speciality with a collection 
of journal papers. Lee [16] analysed the research trends in the
information security field using "co-word analysis". Our work extends
these approaches to measure the similarity/gap between research groups by
comparing their publication outputs.



5 Conclusion 



We have presented a set of metrics to compare research groups' publication
outputs and have shown how they can be applied to measure
the similarity/gap between them. For example, our proposed method can
explain a research group's connectedness to the research mainstream, both
statically and over time. We showed the similarity/gap between the
publication patterns of Korean and global researchers in information security
from 2006 to 2008. The experimental results show that, as suspected, Korean
security researchers have been somewhat isolated from the mainstream.

Our approach has a lot of potential. First of all, it can show the
dynamics of the publication trend in a given research group by comparing its
scientific production periodically. Also, we can explain the similarity/gap
between the scientific contributions of a given research group and those of
the world leaders in a field.

Appendix

A Centrality metrics

A.1 Degree

In a J-projected graph, the degree of a node approximately measures how
many authors publish articles in the node (journal or proceeding),
since an edge adjacent to the node means that an author published at
least one article in this node.

A.2 Closeness

However, degree has a shortcoming since it only takes into account the
immediate edges that a node has, rather than edges to all others. Moreover,
degree does not capture the characteristics of weighted graphs. Therefore we
additionally consider closeness, which focuses on the geodesic distance of
a node to all others in the network. The closeness of a node $v$, $c(v)$, is
computed as follows [27]:



$$c(v) = \frac{1}{\sum_{u \in V} \mathrm{distance}(v, u)}$$

Closeness centrality focuses on the extensibility of influence over the entire 
network. In a J-projected graph, Closeness measures how close all other 
journals/proceedings in the network are located from a given journal (or 
proceeding). 
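A small Python sketch may help make closeness concrete on a weighted J-projected graph. It assumes the common form of closeness, the reciprocal of the sum of geodesic distances from a node to all other nodes, and it assumes that an edge weight acts as the length of that edge; both are assumptions of this illustration. The function names and the adjacency-dictionary layout are hypothetical, and the graph is the four-journal example of Section 2.1.

```python
import heapq


def shortest_distances(adj, source):
    """Dijkstra's algorithm on {node: {neighbour: edge_length}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour, length in adj[node].items():
            candidate = d + length
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist


def closeness(adj, v):
    """Reciprocal of the summed distances from v to every other node.

    Assumes a connected graph with at least two nodes.
    """
    dist = shortest_distances(adj, v)
    total = sum(d for node, d in dist.items() if node != v)
    return 1.0 / total


# J-projected graph of the example in Section 2.1 (weight 0.5 on j2-j3).
graph = {
    "j1": {"j2": 1.0, "j3": 1.0},
    "j2": {"j1": 1.0, "j3": 0.5, "j4": 1.0},
    "j3": {"j1": 1.0, "j2": 0.5, "j4": 1.0},
    "j4": {"j2": 1.0, "j3": 1.0},
}
c_j1 = closeness(graph, "j1")
c_j2 = closeness(graph, "j2")
```

Under these assumptions `j2` (distances 1, 0.5 and 1 to the other nodes) has closeness 1/2.5 = 0.4, while `j1` (distances 1, 1 and 2) has closeness 1/4 = 0.25, so `j2` is the more central node.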

A.3 Betweenness

The other important centrality measure is betweenness. Let $\sigma_{st}$
denote the number of shortest paths from $s \in V$ to $t \in V$, where
$\sigma_{ss} = 1$. Let $\sigma_{st}(v)$ denote the number of shortest paths
from $s \in V$ to $t \in V$ passing through $v \in V$. The betweenness of a
node $v$, $b(v)$, is computed as follows [Til I3|:

$$b(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}$$

Betweenness is a measure of the extent to which a node lies on the paths
between others. This measure favours nodes that join communities (dense
subnetworks), rather than nodes that lie inside a community. In a J-projected
graph, journals/proceedings with high betweenness are connectors between
separate groups of journals/proceedings (depending on levels or topics).
B Central journals/proceedings in the example

(a) Korean researchers (2006) (b) Global researchers (2006)
(c) Korean researchers (2007) (d) Global researchers (2007)
(e) Korean researchers (2008) (f) Global researchers (2008)

Fig. 3 degree-central nodes in the projected graphs

(a) Korean researchers (2006) (b) Global researchers (2006)
(c) Korean researchers (2007) (d) Global researchers (2007)
(e) Korean researchers (2008) (f) Global researchers (2008)

Fig. 4 closeness-central nodes in the projected graphs

(e) Korean researchers (2008) (f) Global researchers (2008)

Fig. 5 betweenness-central nodes in the projected graphs



